Multigrid meets neural nets
We present evidence that multigrid (MG) works for wave equations in disordered systems, e.g. in the presence
of gauge fields, no matter how strong the disorder. We introduce a "neural computations" point of view into large 
scale simulations: First, the system must learn how to do the simulations efficiently, then do the simulation (fast). 
The method can also be used to provide smooth interpolation kernels which are needed in multigrid Monte Carlo 
updates. 



1. INTRODUCTION 

There is a stochastic multigrid method and a deterministic one. The
stochastic version is used to compute high dimensional integrals in
Euclidean quantum field theory or statistical mechanics by a Monte
Carlo method which uses updates at different length scales [1,2]. The
deterministic version [3] solves discretized partial differential
equations. One hopes to use both of them in simulations of lattice
QCD, for updating the gauge fields and for computing fermion
propagators in given gauge fields. In either case the aim is to beat
critical slowing down (CSD) in nearly critical systems.

Our notation is as follows: $\Lambda^0$ denotes a given "fundamental"
lattice of spacing $a_0$. Coarser (block) lattices of increasing
spacings $a_j = L_b^j a_0$ are denoted $\Lambda^1, \Lambda^2, \ldots,
\Lambda^N$. Typically, we chose $L_b = 2$, and a single point as the
last layer $\Lambda^N$. Interpolation operators $A^j$ are introduced to
transfer functions on coarser lattices into functions on finer
lattices, while restriction operators $C^j = A^{j*}$ transfer functions
from finer to coarser lattices ("variational coarsening").
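The layer structure described above can be sketched in a few lines. This is only an illustration: the function name `lattice_hierarchy` and its arguments are hypothetical, and the linear size is assumed to be a power of the blocking factor $L_b$.

```python
def lattice_hierarchy(n_sites, block_factor=2, a0=1.0):
    """Return [(j, sites per direction, spacing a_j)] down to a single point.

    n_sites is the linear extension of the fundamental lattice; it is
    assumed to be a power of block_factor (the L_b of the text).
    """
    layers = []
    j = 0
    while True:
        # a_j = L_b^j * a_0
        layers.append((j, n_sites, a0 * block_factor ** j))
        if n_sites == 1:
            break
        if n_sites % block_factor != 0:
            raise ValueError("linear size must be a power of the block factor")
        n_sites //= block_factor
        j += 1
    return layers


layers = lattice_hierarchy(8)
for j, size, spacing in layers:
    print(j, size, spacing)
```

For a linear extension of 8 and $L_b = 2$ the last layer carries the index $N = 3$.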



2. IMPORTANCE OF SMOOTHNESS 

A crucial problem is how to define and exhibit smooth functions in the
disordered context, i.e. when translation symmetry is strongly
violated. Other possible applications besides gauge theories are low
lying states of spin glasses, the shape of lightning, waves on fractal
lattices (with bond percolation), or the localization of low lying
electronic states in amorphous materials.

In the case of deterministic MG, one wants to solve a discretized
elliptic differential equation on $\Lambda^0$:

$D_0 \xi^0 = f^0$   (1)



It might have arisen from an eigenvalue equation $D_0 \xi^0 =
\varepsilon\,\xi^0$ by inverse iteration. If $D_0$ has a small
eigenvalue, then local relaxation algorithms suffer from CSD. After
some relaxation sweeps on $\Lambda^0$ one gets an approximate solution
$\tilde\xi^0$ whose error $e^0 = \xi^0 - \tilde\xi^0$ is not
necessarily small but is smooth (on length scale $a_0$). The unknown
error $e^0$ satisfies the equation

$D_0 e^0 = r^0$   (2)



with the residual $r^0 = f^0 - D_0 \tilde\xi^0$. Given that $e^0$ is
smooth, it can be obtained by smooth interpolation of a suitable
function $e^1$ on $\Lambda^1$,

$e^0 = A^1 e^1$   (3)



That is, $e^0_z = \sum_{x \in \Lambda^1} A^1_{zx} e^1_x$ with
$A^1_{zx}$ which depends smoothly on $z$. Now define a restriction
operator $C^1$ such that $C^1 A^1 = 1$. Then (3) can be inverted,
$e^1 = C^1 e^0$. Applying $C^1$ to both sides of (2) yields an equation
for $e^1$,

$D_1 e^1 = r^1$   (4)



with $r^1 = C^1 r^0$ and the effective operator $D_1 = C^1 D_0 A^1$.
Given $e^1$, one obtains $e^0$ from (3), and $\tilde\xi^0 + e^0$ is an
improved solution of (1). Thus, the problem has been reduced to an
equation on the lattice $\Lambda^1$ which has fewer points. If
necessary, one repeats the procedure, moving to $\Lambda^2$ etc. The
procedure stops, because an equation on a "lattice" $\Lambda^N$ with
only a single point is easy to solve.
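The reduction from (1) to (4) can be made concrete with a small two-grid sketch. Everything specific here is an assumption chosen for illustration and not taken from the paper: a free one-dimensional periodic operator $D_0 = -\Delta + m^2$ without gauge field, piecewise constant kernels on blocks of two sites normalized so that $C^1 A^1 = 1$ with $C^1 = A^{1*}$, one Gauss-Seidel sweep as the relaxation, and a dense direct solve on the coarse lattice.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def solve(m, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(m, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        for r in range(i + 1, n):
            factor = aug[r][i] / aug[i][i]
            for c in range(i, n + 1):
                aug[r][c] -= factor * aug[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def laplacian_1d(n, mass2):
    """Toy D_0 = -Laplacian + mass2 on a periodic 1D lattice (assumption)."""
    d = [[0.0] * n for _ in range(n)]
    for z in range(n):
        d[z][z] = 2.0 + mass2
        d[z][(z + 1) % n] -= 1.0
        d[z][(z - 1) % n] -= 1.0
    return d

def gauss_seidel(d, xi, f):
    """One local relaxation sweep through the fine lattice."""
    xi = list(xi)
    for z in range(len(xi)):
        s = sum(d[z][w] * xi[w] for w in range(len(xi)) if w != z)
        xi[z] = (f[z] - s) / d[z][z]
    return xi

def two_grid_step(d0, a1, xi, f):
    c1 = transpose(a1)                       # variational coarsening, C^1 = A^1*
    d1 = matmul(matmul(c1, d0), a1)          # effective operator D_1 = C^1 D_0 A^1
    xi = gauss_seidel(d0, xi, f)             # relaxation leaves a smooth error
    r0 = [fi - di for fi, di in zip(f, matvec(d0, xi))]   # residual of eq. (2)
    e1 = solve(d1, matvec(c1, r0))           # coarse equation (4)
    e0 = matvec(a1, e1)                      # interpolation (3)
    return [x + e for x, e in zip(xi, e0)]

n, mass2 = 8, 0.01
d0 = laplacian_1d(n, mass2)
a1 = [[1 / math.sqrt(2) if z // 2 == x else 0.0 for x in range(n // 2)]
      for z in range(n)]
exact = [math.sin(2 * math.pi * z / n) + 1.0 for z in range(n)]
f = matvec(d0, exact)
xi = two_grid_step(d0, a1, [0.0] * n, f)
```

After the step the residual restricted to $\Lambda^1$ vanishes up to rounding, which is the sense in which the coarse equation has been solved exactly.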

The iterated interpolation $A^{[0j]} \equiv A^1 A^2 \cdots A^j$ from
$\Lambda^j$ to $\Lambda^0$ should yield functions on $\Lambda^0$ which
are smooth on length scale $a_j$, i.e. which change little over a
distance $a_j$ (in the ordered case). For reasons of practicality, one
must require $A^j_{zx} = 0$ unless $z$ is near $x$.

3. SMOOTHNESS AND DISORDER 

A successful MG scheme, whether determinis- 
tic or stochastic, needs smooth interpolation ker- 
nels A. Thus we may ask: Which functions are 
smooth in the disordered situation, for instance 
in an external gauge field? 

A (gauge covariant) naive answer is

$\sum_\mu \|\nabla_\mu \xi\|^2 \ll \|\xi\|^2$

(with discretized covariant derivatives $\nabla_\mu$). By definition,
the lowest eigenvalue $\varepsilon_0$ of the negative covariant
Laplacian $-\Delta$ is not small for disordered gauge fields. (It is
positive and vanishes only for pure gauges.) Therefore there are no
smooth functions in this case.

Nevertheless there is an answer to the question, assuming a fundamental
differential operator $D_0$ is specified by the problem (in the
stochastic case, the Hamiltonian often provides $D_0$):

A function $\xi$ on $\Lambda^0$ is smooth on length scale $a$ when
$\|D_0 \xi\|^2 \ll \|\xi\|^2$ in units $a = 1$.
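A small numerical check may help to see why smoothness has to be measured with $D_0$ rather than by constancy. The setting below is an assumption made for illustration only: a one-dimensional periodic lattice with U(1) link variables that are a pure gauge, and $D_0$ taken to be the negative covariant Laplacian. The function names are hypothetical.

```python
import cmath

def covariant_laplacian_apply(links, xi):
    """(-Delta xi)_z = 2 xi_z - U_z xi_{z+1} - conj(U_{z-1}) xi_{z-1}, periodic 1D.

    links[z] is the U(1) link variable from site z to site z+1.
    """
    n = len(xi)
    return [2 * xi[z]
            - links[z] * xi[(z + 1) % n]
            - links[(z - 1) % n].conjugate() * xi[(z - 1) % n]
            for z in range(n)]

def smoothness_ratio(apply_d0, xi):
    """||D_0 xi||^2 / ||xi||^2; a value much smaller than 1 means 'smooth'."""
    d_xi = apply_d0(xi)
    return sum(abs(v) ** 2 for v in d_xi) / sum(abs(v) ** 2 for v in xi)

n = 8
# Site phases g_z of a gauge transformation; the links are then a pure gauge.
g = [cmath.exp(0.7j * z * z) for z in range(n)]
links = [g[z] * g[(z + 1) % n].conjugate() for z in range(n)]

def apply_d0(xi):
    return covariant_laplacian_apply(links, xi)

ratio_constant = smoothness_ratio(apply_d0, [1.0 + 0j] * n)   # constant function
ratio_covariant = smoothness_ratio(apply_d0, g)               # xi_z = g_z
print(ratio_constant, ratio_covariant)
```

In this toy case the constant function is not smooth in the sense of the criterion, while the gauge-rotated function $\xi_z = g_z$ is annihilated by $-\Delta$ up to rounding.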

We found that a deterministic multigrid which employs interpolation
kernels $A^{[0j]}$ from $\Lambda^j$ to the fundamental lattice
$\Lambda^0$ which are smooth in this sense, works for arbitrarily
disordered gauge fields. When there are no smooth functions in this
sense at length scale $a_0$, then $D_0$ has no low eigenvalue, and
there is no CSD and no need for MG.

The above answer appears natural, and the "projective MG" of [4,5] is
in its spirit. But to obtain kernels $A^{[0j]}_{zx}$ which are smooth
on length scale $a_j$, one needs approximate solutions of eigenvalue
equations

$(D_0 A^{[0j]})_{zx} = \varepsilon_0(x)\, A^{[0j]}_{zx}$   (5)



Since $A^{[0j]}_{zx}$ is required to vanish for $z$ outside a
neighbourhood of $x$, the problem involves Dirichlet boundary
conditions. For large $j$, $A^{[0j]}_{zx}$ will have a large support.
If there is no degeneracy in the lowest eigenvalue, one can use inverse
iteration combined with standard relaxation algorithms for the
resulting inhomogeneous equation. But this and other standard methods
will suffer from CSD again. Moreover, in the standard multigrid setup,
one uses basic interpolation kernels $A^j$ which interpolate from one
grid $\Lambda^j$ to the next finer one. In this case

$A^{[0j]} = A^1 A^2 \cdots A^j$   (6)

and (5) becomes a very complicated set of nonlinear conditions.
Possible solutions are

(i) Replace (5) by minimality of a cost functional (cp. later). Use
neural algorithms to find kernels $A^j$ which minimize it. This is
still under study.

(ii) Give up factorization (6) and determine independent kernels
$A^{[0j]}$ as solutions of (5) by multigrid iteration. This is done
successively for $j = 1, 2, \ldots$ One uses already determined kernels
$A^{[0k]}$ with $k < j$ for updating $A^{[0j]}$. We found that this
works very well, cp. sect. 6.

Method (ii) will need of order $L^d \ln L$ storage space and
$L^d \ln^2 L$ computational work for a $d$-dimensional system of
linear extension $L$.
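The kernel equation (5) with Dirichlet boundary conditions can be illustrated by plain inverse iteration on a small support. This sketch makes several simplifying assumptions that are not from the paper: a free one-dimensional Laplacian without gauge field, a direct tridiagonal solve in place of relaxation or multigrid iteration, and hypothetical function names. It is not method (ii) itself, only the eigenvalue problem that method addresses.

```python
import math

def solve_dirichlet_laplacian(rhs):
    """Solve (-Delta) u = rhs on a 1D segment with u = 0 outside (Thomas algorithm)."""
    m = len(rhs)
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c_prime[i - 1]          # diagonal 2, off-diagonals -1
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return u

def lowest_mode(support, n_iter=30):
    """Inverse iteration for the lowest Dirichlet mode on `support` sites."""
    u = [0.0] * support
    u[support // 2] = 1.0                     # start from a delta function at the centre
    for _ in range(n_iter):
        u = solve_dirichlet_laplacian(u)
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
    # Rayleigh quotient (u, -Delta u) as an estimate of the lowest eigenvalue
    lap = [2 * u[i]
           - (u[i - 1] if i > 0 else 0.0)
           - (u[i + 1] if i < support - 1 else 0.0)
           for i in range(support)]
    return u, sum(a * b for a, b in zip(u, lap))

kernel, eps0 = lowest_mode(7)
print(eps0, 2 - 2 * math.cos(math.pi / 8))
```

The resulting profile is positive and peaked at the centre of its support, which is the qualitative shape one expects of a smooth kernel column in the ordered case.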

4. CRITERIA FOR OPTIMALITY 

Any iteration to solve (1) amounts to updating steps of the form

$\tilde\xi^{0\prime} = \varrho\,\tilde\xi^0 + \sigma f^0$   (7)

with the iteration matrix $\varrho$ whose norm governs the convergence,
and $\sigma = (1 - \varrho) D_0^{-1}$. If $\|\varrho\| < 1$, the
iteration converges with a relaxation time
$\tau \le -1/\ln\|\varrho\|$. Parameters in the algorithm, such as
operators $A^j$, $C^j$ and $D_j$, are optimal if the cost functional
$E = \|\varrho\|^2$ is at its minimum.
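The bound on the relaxation time makes critical slowing down easy to see in a toy case. The sketch below assumes a free one-dimensional periodic operator $D_0 = -\Delta + m^2$ and an undamped Jacobi sweep, and it uses the operator norm of the iteration matrix; all of these are illustrative choices rather than the paper's setup, and the function names are hypothetical.

```python
import math

def relaxation_time(norm_rho):
    """Upper bound tau <= -1/ln||rho|| for an iteration with ||rho|| < 1."""
    if not 0.0 < norm_rho < 1.0:
        raise ValueError("need 0 < ||rho|| < 1 for a convergent iteration")
    return -1.0 / math.log(norm_rho)

def jacobi_norm(n, mass2):
    """Operator norm of rho = 1 - d_0^{-1} D_0 for D_0 = -Laplacian + mass2.

    On a periodic 1D lattice the eigenvalues of -Laplacian are
    2 - 2 cos(2 pi k / n), and d_0 = 2 + mass2 is the diagonal part.
    """
    d0 = 2.0 + mass2
    eigs = [2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) + mass2 for k in range(n)]
    return max(abs(1.0 - lam / d0) for lam in eigs)

for mass2 in (1.0, 0.1, 0.01):
    print(mass2, relaxation_time(jacobi_norm(64, mass2)))
```

In this example the norm is $2/(2 + m^2)$, so the bound grows roughly like $2/m^2$ as the lowest eigenvalue of $D_0$ goes to zero.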

As an example, consider a twogrid iteration in which a standard
relaxation sweep on $\Lambda^0$ with iteration matrix $\varrho_0$ is
followed by exact solution of the coarse grid equation (4). The second
step leads to an updating with some iteration matrix $\varrho_1$, and
$\varrho = \varrho_1 \varrho_0$. Therefore one may estimate
$E \le \|D_0 \varrho_0\|^2 E_1$ with $E_1 = \|D_0^{-1} \varrho_1\|^2$
(fine grid relaxation smoothens the error but does not converge fast;
therefore $\|D_0 \varrho_0\|$ is suppressed whereas $\|\varrho_0\|$ is
not much smaller than 1) and try to optimize the parameters above by
minimizing $E_1$. Using the trace norm,
$\|\varrho\|^2 = \mathrm{tr}\,\varrho\varrho^*$, one finds

$E_1 = \mathrm{Volume}^{-1} \sum_{z,w \in \Lambda^0} |\Gamma_{zw}|^2$

with $\Gamma = D_0^{-1} - A^1 D_1^{-1} C^1$. Prescribing $C^1$ and
determining $D_1$ and $A^1$ by minimizing $E_1$ yields what we call the
"ideal interpolation kernel" $A^1_{zx}$ for a given restriction map
$C^1$. Since it has exponential tails instead of vanishing for $z$
outside a neighbourhood of $x$, it is impractical for production runs,
though.

5. NEURAL MULTIGRID (NMG) 

A feed-forward artificial neural network (ANN) [6] can perform the
computations to solve (1) by MG relaxation.

The nodes ("neurons") of the NMG are identified with points of the MG
as shown in Fig. 1. The resulting NMG consists of two copies of the
same MG, except that the last layer $\Lambda^N$ is not duplicated. In
the standard MG approach, the basic interpolation kernels $A^j$
interpolate from one layer $\Lambda^j$ to the preceding one,
$\Lambda^{j-1}$. Each node is connected to some of the nodes in the
preceding layer. In the upper half, the connection strength from
$x \in \Lambda^j$ to $z \in \Lambda^{j-1}$ is $A^j_{zx}$. In the lower
half, node $z \in \Lambda^{j-1}$ is connected to $x \in \Lambda^j$ with
strength $R^j_{xz}$. In addition there is a connection of strength
$\omega_j d_j^{-1}$ between the two nodes which represent the same
point $z$ in $\Lambda^j$ ($j < N$).

According to Hebb's hypothesis of synaptical 
learning, a biological neural network learns by ad- 
justing the strength of its synaptical connections. 

The network receives as input an approximate solution $\tilde\xi$ of
(1), from which the residual $r^0 = f^0 - D_0 \tilde\xi$ is then
determined. It computes as output an improved solution
$O = \tilde\xi + \delta\xi$. The desired output ("target") is
$\zeta = D_0^{-1} f^0$. The correction $\delta\xi$ is a linear function
of $r^0$. Except on the bottom layer, each node receives as input a
weighted sum of the output of those nodes below it in the diagram to
which it is connected. The weights are given by the connection
strengths. Our neurons are linear because our problem is linear. The
output of each neuron is a linear function of the input.

Figure 1. A feed-forward NMG architecture.

The result of the computation is

$\delta\xi = \Big( \omega_0 d_0^{-1} + \sum_{k \ge 1} A^{[0k]}\, \omega_k d_k^{-1}\, R^{[k0]} \Big) r^0$   (8)

where $R^{[k0]} = R^k \cdots R^2 R^1$ and
$A^{[0k]} = A^1 A^2 \cdots A^k$. The operators $C^j$ and $D_j$
($j > 0$), and the damping parameters $\omega_j$ ($j > 0$) are not
needed separately since they only enter in the combination
$R^j = C^j (1 - \omega_{j-1} D_{j-1} d_{j-1}^{-1})$. The fundamental
differential operator $D_0$ and its diagonal part $d_0$ are furnished
as part of the problem. The connection strengths ("synaptical
strengths") $A^j_{zx}$, $R^j_{xz}$ (and possibly $\omega_0$) need to be
found by a learning process in such a way that the actual output is as
close as possible to the desired output. In supervised learning [6],
pairs $(\xi^\mu, \zeta^\mu)$ ("training patterns") are presented to an
ANN. Given input $\xi^\mu$, the actual output $O^\mu$ is compared to
the target $\zeta^\mu$, and the connection strengths are adjusted in
such a way that the cost functional
$E = \sum_\mu \|O^\mu - \zeta^\mu\|^2$ gets minimized. An iterative
procedure to achieve this minimization is called a learning rule.
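The computation in (8) can be read as one pass down the lower half of the network followed by one pass up the upper half. The sketch below is an interpretation under stated assumptions: operators are stored as small dense lists, the factor $\omega_k d_k^{-1}$ is applied on every layer including the last one, and the function name and argument layout are hypothetical.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def nmg_correction(r0, d_ops, a_ops, c_ops, omegas):
    """Evaluate delta_xi of eq. (8) by a downward and an upward pass.

    d_ops[k] is D_k; a_ops[k] and c_ops[k] are A^k and C^k for k >= 1
    (index 0 of a_ops and c_ops is an unused placeholder).
    """
    n_layers = len(d_ops)
    residuals = [list(r0)]
    local = []                    # omega_k d_k^{-1} r^k on each layer
    # Lower half: r^{k+1} = R^{k+1} r^k with R^{k+1} = C^{k+1}(1 - omega_k D_k d_k^{-1})
    for k in range(n_layers):
        r = residuals[k]
        step = [omegas[k] * r[x] / d_ops[k][x][x] for x in range(len(r))]
        local.append(step)
        if k + 1 < n_layers:
            damped = [ri - di for ri, di in zip(r, matvec(d_ops[k], step))]
            residuals.append(matvec(c_ops[k + 1], damped))
    # Upper half: interpolate the layer contributions back and add them up.
    delta = local[-1]
    for k in range(n_layers - 1, 0, -1):
        delta = [s + t for s, t in zip(local[k - 1], matvec(a_ops[k], delta))]
    return delta

# Two fine sites blocked into a single coarse point (toy example).
d0 = [[2.0, -1.0], [-1.0, 2.0]]
a1 = [[1.0], [1.0]]
c1 = [[1.0, 1.0]]
d1 = [[2.0]]                      # C^1 D_0 A^1 for the matrices above
delta_a = nmg_correction([1.0, 1.0], [d0, d1], [None, a1], [None, c1], [1.0, 1.0])
delta_b = nmg_correction([1.0, 0.0], [d0, d1], [None, a1], [None, c1], [1.0, 1.0])
print(delta_a, delta_b)
```

For the constant residual the single pass already reproduces the exact solution of the toy system, while for the second residual it only gives an approximation; how good that approximation is depends on the connection strengths, which is what the learning process is meant to tune.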

Taking for the sequence $\xi^\mu$ a complete orthonormal system of
functions on $\Lambda^0$, in the limit $f^0 \to 0$, the target is
$\zeta^\mu = 0$ for any input, and the output is
$O^\mu = \varrho\,\xi^\mu$ by (7). The learning rule for the resulting
cost functional

$E = \sum_\mu \|\varrho\,\xi^\mu\|^2 = \mathrm{tr}\,\varrho\varrho^* = \|\varrho\|^2 \stackrel{!}{=} \min$

is our previous optimality condition for multigrid relaxation in
sect. 4.
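As a minimal illustration of such a learning rule, the sketch below tunes a single parameter, the damping $\omega_0$ of a Jacobi sweep with $\varrho = 1 - \omega_0 d_0^{-1} D_0$, by gradient descent on the cost functional above. The training patterns are the unit vectors and all targets are zero. Plain gradient descent, the step size, and the two-site operator are assumptions chosen for illustration; this is not the learning rule used in the paper.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def learn_omega(d0, n_steps=60, rate=0.1):
    """Gradient descent on E(omega) = sum_mu ||rho xi^mu||^2."""
    n = len(d0)
    # m = d_0^{-1} D_0, so that rho = 1 - omega * m
    m = [[d0[z][w] / d0[z][z] for w in range(n)] for z in range(n)]
    basis = [[1.0 if z == mu else 0.0 for z in range(n)] for mu in range(n)]
    omega = 0.0
    for _ in range(n_steps):
        grad = 0.0
        for xi in basis:                       # present each training pattern
            m_xi = matvec(m, xi)
            out = [x - omega * y for x, y in zip(xi, m_xi)]   # O^mu = rho xi^mu
            grad += -2.0 * sum(o * y for o, y in zip(out, m_xi))
        omega -= rate * grad                   # adjust the connection strength
    return omega

def cost(d0, omega):
    """E = tr(rho rho^*) evaluated on the unit-vector patterns."""
    n = len(d0)
    total = 0.0
    for mu in range(n):
        xi = [1.0 if z == mu else 0.0 for z in range(n)]
        m_xi = [sum(d0[z][w] / d0[z][z] * xi[w] for w in range(n)) for z in range(n)]
        total += sum((x - omega * y) ** 2 for x, y in zip(xi, m_xi))
    return total

d0 = [[2.0, -1.0], [-1.0, 2.0]]
omega = learn_omega(d0)
print(omega, cost(d0, omega), cost(d0, 1.0))
```

In this two-site example the cost is quadratic in $\omega_0$, so the descent settles at its unique minimum, which lies below the undamped value $\omega_0 = 1$.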

6. LEARNING RULE PERFORMANCE 

The variant (ii) in sect. 3 involves a slightly different NMG. Instead
of the connections between neighbouring layers of the multigrid, we now
have connections from $\Lambda^0$ to $\Lambda^k$ with strength
$C^{[k0]}_{xz}$, and from $\Lambda^k$ to $\Lambda^0$ with strength
$A^{[0k]}_{zx}$. If we adopt variational coarsening, all connection
strengths are determined by interpolation kernels $A^{[0k]}$ which have
to be learned. The damping factors $\omega_k$ were set to 1 and $d_k$
is the diagonal part of $D_k$ as before, with

$D_k = A^{[0k]*} D_0 A^{[0k]}$   (9)

The learning rule (ii) requires a process of "hard 
thinking" by the NMG. Nodes which have learned 
their lesson already - i.e. which have their con- 
nection strengths fixed - are used to instruct the 
rest of the neural net, adjusting the strengths of 
the next layer of nodes in the NMG. 

A variant of this algorithm was tested in 2 dimensions, using
SU(2)-gauge fields which were equilibrated with standard Wilson action
at various values of $\beta$, and
$D_0 = -\Delta - \varepsilon_0 + \delta m^2$. Here $\varepsilon_0$ is
the lowest eigenvalue of the covariant Laplacian $-\Delta$, and
$\delta m^2 > 0$. Conventional relaxation algorithms for solving (1)
suffer from CSD for such $D_0$, for any volume and small $\delta m^2$.

It turned out that it was not necessary to find accurate solutions of
the eigenvalue equation for the interpolation kernels $A^{[0j]}$. An
approximation $A^{[0j]}_{zx}$ to $(-\Delta)^{-n} \delta_{zx}$ was
computed by multigrid iteration. It does not depend on $\delta m^2$.
Updating $\xi$ at $x \in \Lambda^j$ changes $\xi$ by

$\delta\xi_z = A^{[0j]}_{zx}\, d_{j,x}^{-1}\, r^j_x$, with $r^j = A^{[0j]*} r^0$.



The convergence rate (in units of MG iterations) of the
$\xi$-iteration is shown in Fig. 2 for $\beta = 1.0$. One MG iteration
involved one sweep (in checkerboard fashion) through each MG layer,
starting with $j = 0$.

Figure 2. Correlation time $\tau$ as function of the lowest eigenvalue
$\delta m^2$ in a representative gauge field configuration equilibrated
at $\beta = 1.0$. For the $64^2$ lattice, $\tau$ fluctuates very little
with the gauge field.